OOD Benchmark


On the Value of Out-of-Distribution Testing: An Example of Goodhart's Law

Neural Information Processing Systems

Out-of-distribution (OOD) testing is increasingly popular for evaluating a machine learning system's ability to generalize beyond the biases of a training set. OOD benchmarks are designed to present a different joint distribution of data and labels between training and test time. VQA-CP has become the standard OOD benchmark for visual question answering, but we discovered three troubling practices in its current use. First, most published methods rely on explicit knowledge of the construction of the OOD splits. They often rely on ``inverting'' the distribution of labels, e.g. answering mostly ``yes'' when the common training answer was ``no''.
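The ``inversion'' practice criticized above can be illustrated with a minimal sketch. The function below is hypothetical (the name `inverted_prior_baseline` and the `(question_type, answer)` data format are assumptions, not the paper's code): if a method knows the test answer distribution per question type is roughly inverted relative to training, simply predicting the *rarest* training answer inflates OOD scores without any visual reasoning, which is exactly the Goodhart's Law failure the paper describes.

```python
from collections import Counter

def inverted_prior_baseline(train_qa, question_type):
    """Hypothetical baseline exploiting knowledge of the VQA-CP split
    construction: predict the least common training answer for a given
    question type, since the test distribution is (roughly) inverted.
    `train_qa` is an assumed list of (question_type, answer) pairs.
    """
    counts = Counter(ans for qtype, ans in train_qa if qtype == question_type)
    # Rarest training answer -> likely frequent at OOD test time.
    return min(counts, key=counts.get)
```

Such a baseline needs no image or question understanding at all, which is why scoring well on the split does not certify genuine generalization.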


Detecting Out-of-Distribution Through the Lens of Neural Collapse

Liu, Litian, Qin, Yao

arXiv.org Artificial Intelligence

Out-of-distribution (OOD) detection is essential for the safe deployment of AI. In particular, OOD detectors should generalize effectively across diverse scenarios. To improve upon the generalizability of existing OOD detectors, we introduce a highly versatile OOD detector, called the Neural Collapse inspired OOD detector (NC-OOD). We extend the prevalent observation that in-distribution (ID) features tend to form clusters, whereas OOD features lie far away. Specifically, building on the recent Neural Collapse observation, we further demonstrate that ID features tend to cluster in proximity to weight vectors. From this extended observation, we propose to detect OOD samples based on feature proximity to weight vectors. To further rule out OOD samples, we leverage the observation that OOD features tend to reside closer to the origin than ID features. Extensive experiments show that our approach enhances the generalizability of existing work and consistently achieves state-of-the-art OOD detection performance across a wide range of OOD benchmarks spanning different classification tasks, training losses, and model architectures.

Machine learning models deployed in practice will inevitably encounter samples that deviate from the training distribution. As a classifier cannot make meaningful predictions on test samples that belong to classes unseen during training, it is important to actively detect and handle Out-of-Distribution (OOD) samples. Given the diverse application scenarios, an effective OOD detector should generalize across classification tasks with different input resolutions, numbers of classes, and classification accuracies, as well as across classifiers trained with different schemes and architectures. Since Nguyen et al. (2015) revealed that neural networks tend to be over-confident on OOD samples, an extensive body of research has focused on developing effective OOD detection algorithms.
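The two cues described in the abstract, proximity of a feature to the classifier's weight vectors and the feature's distance from the origin, can be combined into a single score. The sketch below is an illustration under stated assumptions, not the paper's implementation: the function name `nc_ood_score`, the cosine-similarity proximity measure, and the additive trade-off weight `alpha` are all hypothetical choices.

```python
import numpy as np

def nc_ood_score(feature, weight_vectors, alpha=1.0):
    """Hypothetical NC-style OOD score combining two cues:
    (a) proximity of a penultimate-layer feature to the nearest
        classifier weight vector (ID features cluster near them), and
    (b) the feature's norm (OOD features tend to lie nearer the origin).
    Higher score => more likely in-distribution.
    `alpha` is an assumed trade-off hyperparameter.
    """
    # Cosine similarity between the feature and each class weight vector.
    f = feature / (np.linalg.norm(feature) + 1e-12)
    w = weight_vectors / (np.linalg.norm(weight_vectors, axis=1, keepdims=True) + 1e-12)
    proximity = np.max(w @ f)            # closeness to the nearest class cluster
    norm_term = np.linalg.norm(feature)  # ID features sit farther from the origin
    return proximity + alpha * norm_term
```

A practical detector would threshold this score, flagging inputs below the threshold as OOD; the threshold is typically calibrated on held-out ID data.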


Do-GOOD: Towards Distribution Shift Evaluation for Pre-Trained Visual Document Understanding Models

He, Jiabang, Hu, Yi, Wang, Lei, Xu, Xing, Liu, Ning, Liu, Hui, Shen, Heng Tao

arXiv.org Artificial Intelligence

Numerous pre-training techniques for visual document understanding (VDU) have recently shown substantial improvements in performance across a wide range of document tasks. However, these pre-trained VDU models cannot guarantee continued success when the distribution of test data differs from the distribution of training data. In this paper, to investigate how robust existing pre-trained VDU models are to various distribution shifts, we first develop an out-of-distribution (OOD) benchmark termed Do-GOOD for fine-grained analysis of document image-related tasks. The Do-GOOD benchmark defines the underlying mechanisms that produce different distribution shifts and contains 9 OOD datasets covering 3 VDU-related tasks, i.e., document information extraction, classification, and question answering. We then evaluate the robustness of, and perform a fine-grained analysis on, 5 recent pre-trained VDU models and 2 typical OOD generalization algorithms on these OOD datasets. Results from the experiments demonstrate that there is a significant performance gap between the in-distribution (ID) and OOD settings for document images, and that fine-grained analysis of distribution shifts can reveal the brittle nature of existing pre-trained VDU models and OOD generalization algorithms. The code and datasets for our Do-GOOD benchmark can be found at https://github.com/MAEHCM/Do-GOOD.
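The core evaluation the abstract describes, measuring the gap between ID and OOD performance across many shifted datasets, can be sketched in a few lines. This is a generic illustration, not the Do-GOOD evaluation code: the helper name `robustness_report` and the accuracy-dictionary format are assumptions.

```python
def robustness_report(id_acc, ood_accs):
    """Hypothetical helper summarizing the ID -> OOD performance drop.
    `id_acc` is accuracy on the in-distribution test set;
    `ood_accs` maps assumed OOD dataset names to accuracies.
    Returns per-dataset gaps and their average (positive = degradation).
    """
    gaps = {name: id_acc - acc for name, acc in ood_accs.items()}
    avg_gap = sum(gaps.values()) / len(gaps)
    return {"per_dataset_gap": gaps, "average_gap": avg_gap}
```

Reporting per-dataset gaps, rather than a single averaged number, is what enables the fine-grained analysis of which shift mechanisms a model is brittle to.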